Discovery of Phylogenetic Relevant Y-chromosome Variants in 1000 Genomes Project Data

Authors

  • Chuan-Chao Wang
  • Hui Li
Abstract

Current Y chromosome research is limited by the poor resolution of the Y chromosome phylogenetic tree. Entirely sequenced Y chromosomes from large numbers of individuals have only recently become available with the advent of next-generation sequencing technology. The 1000 Genomes Project has sequenced Y chromosomes from more than 1000 males. Here, we analyzed 1000 Genomes Project Y chromosome data from 1269 individuals and discovered about 25,000 phylogenetically relevant SNPs. These new markers are useful for Y chromosome phylogeny and will increase the phylogenetic resolution available to many Y chromosome studies.

Introduction

The paternally inherited Y chromosome has been widely used in anthropology and population genetics to understand the origin and migration of human populations. With a very low mutation rate on the order of 3.0x10 mutations/nucleotide/generation, single nucleotide polymorphisms (SNPs) of the Y chromosome have been used to construct a phylogenetic tree linking all the Y chromosome lineages from world populations. Beginning in the mid-1980s, Y-specific probes were isolated from cosmid libraries and used together with a set of restriction enzymes to search for male-specific restriction fragment length polymorphisms (RFLPs). Since the late 1990s, denaturing high-performance liquid chromatography (DHPLC) has been used to detect SNPs in the single-copy regions of the male-specific region of the Y chromosome (MSY). During the last ten years, a robust genealogical tree of human Y chromosomes based on about three thousand stable SNPs has been built, permitting inference of human population demographic history. However, current Y chromosome research is still limited by poor resolution in some specific Y chromosome branches, such as haplogroups C-M130, D-M174, N-M231, O-M175, H-M69, and L-M11. Although these haplogroups are carried by very large populations, fewer markers have been defined within them than within haplogroups R and E.
For instance, three Y-SNP markers, 002611, M134 and M117, represent about 260 million people in East Asia, but the known downstream markers are far too few to reveal informative genetic substructure in those populations.

Entirely sequenced Y chromosomes from large numbers of individuals have only recently become available with the advent of next-generation sequencing technology [15, 16]. For instance, the 1000 Genomes Project has sequenced Y chromosomes from more than 1000 males. Here, we analyzed 1000 Genomes Project Y chromosome data from 1269 individuals and discovered thousands of new SNPs that might be useful for Y chromosome phylogeny. These new markers will increase the phylogenetic resolution available to many Y chromosome studies.

Materials and Methods

The phylogenetic tree was based on the ISOGG tree as of 6 September 2013 (http://www.isogg.org/). The SAMtools (version 0.1.9) view command was used to download mapped BAM files from publicly accessible FTP sites at the European Bioinformatics Institute (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/) and the National Center for Biotechnology Information (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/). Reads that were uniquely mapped to the Y chromosome with a quality ≥ 15 were extracted from the SAM files and converted into BAM files with SAMtools. Duplicates were removed with samtools rmdup. Variants were called with SAMtools mpileup. The resulting BCF file was then converted into VCF format using bcftools. Haplogroups were classified using the WHY.pl and AMY-tree.pl scripts. To evaluate the accuracy of haplogroup assignment, maximum likelihood haplogroup trees were produced with PhyML (version 20120412) under the HKY85 model, and bootstrap values were computed from 100 subsamplings. Heterozygous calls and calls with phred-scaled quality < 30 were removed when constructing the trees. Taking the tree topology into consideration, the VCF files were opened in MS Excel for visual identification of potentially phylogenetically relevant SNPs.
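The call-filtering step described above (dropping heterozygous calls and calls with phred-scaled quality below 30) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the threshold is applied to the site-level QUAL column, heterozygous genotypes are masked as missing instead of discarding the whole site, and every record is assumed to carry a GT field in its FORMAT column.

```python
def is_heterozygous(gt):
    """Return True for genotypes with two different called alleles, e.g. '0/1'."""
    alleles = gt.replace("|", "/").split("/")
    called = [a for a in alleles if a != "."]
    return len(set(called)) > 1


def filter_vcf_lines(lines, min_qual=30.0):
    """Yield VCF lines, skipping low-QUAL sites and masking heterozygous calls.

    Assumption: 'quality' means the site-level QUAL column (field 6).
    Assumption: each data line has a FORMAT column that contains GT.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        qual = fields[5]
        if qual == "." or float(qual) < min_qual:
            continue  # drop sites without QUAL or below the threshold
        gt_index = fields[8].split(":").index("GT")
        for i in range(9, len(fields)):
            parts = fields[i].split(":")
            if is_heterozygous(parts[gt_index]):
                # The Y chromosome is haploid in males, so a heterozygous
                # call is treated as unreliable and set to missing.
                parts[gt_index] = "./."
                fields[i] = ":".join(parts)
        yield "\t".join(fields)
```

Masking individual genotypes keeps the site available for the remaining samples; a stricter variant could drop any site that has a heterozygous call.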
Novel variants were filtered by verifying that all other haplogroup control samples bore the ancestral allele, and by identifying at least two samples in the case haplogroup that carried the same
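
The novelty filter described here can be sketched in Python as follows. This is a hedged illustration only. The source sentence is cut off, so the sketch assumes that the case samples must share the derived allele. The function name, the 0/1 allele coding, and the choice to ignore missing calls are all hypothetical and do not come from the original text.

```python
ANCESTRAL, DERIVED = 0, 1


def is_candidate_marker(calls, case_samples, min_carriers=2):
    """Decide whether one SNP is a candidate marker for a case haplogroup.

    calls: dict mapping sample name -> ANCESTRAL, DERIVED, or None (no call).
    case_samples: set of sample names assigned to the case haplogroup.
    Assumption: samples without a call are ignored on both sides.
    """
    carriers = 0
    for sample, allele in calls.items():
        if allele is None:
            continue
        if sample in case_samples:
            if allele == DERIVED:
                carriers += 1
        elif allele != ANCESTRAL:
            # A control sample outside the case haplogroup carries the
            # derived allele, so the SNP is rejected.
            return False
    return carriers >= min_carriers
```

For example, a SNP that is derived in two samples of the case haplogroup and ancestral in every called control sample passes, while a SNP that is derived in only one case sample does not.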

Similar resources

Discovery of Western European R1b1a2 Y Chromosome Variants in 1000 Genomes Project Data: An Online Community Approach

The authors have used an online community approach, and tools that were readily available via the Internet, to discover genealogically and therefore phylogenetically relevant Y-chromosome polymorphisms within core haplogroup R1b1a2-L11/S127 (rs9786076). Presented here is the analysis of 135 unrelated L11 derived samples from the 1000 Genomes Project. We were able to discover new variants and bu...

Generation of high-resolution a priori Y-chromosome phylogenies using “next-generation” sequencing data

An approach for generating high-resolution a priori maximum parsimony Y-chromosome (“chrY”) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (“next-generation”) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty...

I-49: Human Y Chromosome Proteome Project

The success of the Human Genome Project (HGP) has provided a blueprint for the approximately 20,000 gene-encoded proteins potentially active in all of the hundreds of cell types that make up the human body. Yet we still have limited knowledge about a majority of the gene-encoded proteins which are the “building blocks of life” and “cellular machinery”. It is estimated that for nearly half of th...

A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

BACKGROUND Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advan...

I-3: Human Y Chromosome Proteome Project 2012 Update

The Human Genome Project has generated a blueprint for the approximately 20,300 gene-encoded proteins potentially active in any of 230 cell types that make up the human body (human proteome). However, based on the UniProtKB/Swiss-Prot database content, about 6000 of at the protein level; for many others, there is very little information related to protein function, abundance, subcellular locali...

Publication date: 2013